
gh-140795: fetch thread state once on fast path for critical sections #141406

Merged
kumaraditya303 merged 3 commits into python:main from kumaraditya303:critical-section on Nov 21, 2025

Conversation

@kumaraditya303
Contributor

@kumaraditya303 kumaraditya303 commented Nov 11, 2025

Currently, on the fast path for critical sections (where the objects are not locked), the thread state is fetched twice: once on acquire and once on release. This PR fetches it once and stores it in a temporary variable on the stack, so the fast path fetches the thread state only once.
This should help performance in shared modules such as ssl, where thread state access is slower because of the extra function call to _PyThreadState_GetCurrent.
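
To make the idea concrete, here is a minimal toy sketch of the pattern, not the actual CPython change. Every name in it (`toy_tstate`, `toy_get_tstate`, `toy_begin_v1`, and so on) is hypothetical and exists only for illustration. A counter stands in for the cost of each thread-state lookup.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for illustration; none of these names exist in CPython. */
typedef struct toy_tstate {
    int critical_depth;   /* how many toy critical sections are open */
} toy_tstate;

static _Thread_local toy_tstate toy_current;
static int toy_lookups;   /* counts thread-state lookups */

/* Models the (potentially slow) thread-state lookup. */
static toy_tstate *toy_get_tstate(void) {
    toy_lookups++;
    return &toy_current;
}

/* Before: begin and end each look up the thread state themselves. */
static void toy_begin_v1(void) { toy_get_tstate()->critical_depth++; }
static void toy_end_v1(void)   { toy_get_tstate()->critical_depth--; }

/* After: begin and end receive a thread state the caller already fetched. */
static void toy_begin_v2(toy_tstate *ts) { ts->critical_depth++; }
static void toy_end_v2(toy_tstate *ts)   { ts->critical_depth--; }

/* Returns the number of lookups for one begin/end pair, old style. */
static int toy_lookups_two_fetch(void) {
    toy_lookups = 0;
    toy_begin_v1();
    /* ... protected work would go here ... */
    toy_end_v1();
    return toy_lookups;
}

/* Returns the number of lookups when the thread state is kept on the stack. */
static int toy_lookups_one_fetch(void) {
    toy_lookups = 0;
    toy_tstate *ts = toy_get_tstate();   /* single lookup, kept in a local */
    toy_begin_v2(ts);
    /* ... protected work would go here ... */
    toy_end_v2(ts);
    return toy_lookups;
}
```

In this toy model, `toy_lookups_two_fetch()` performs two lookups and `toy_lookups_one_fetch()` performs one. The real saving depends on how expensive the lookup is in a given build.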

@colesbury
Contributor

How much of a difference does this make to asyncio_tcp_ssl?

@kumaraditya303
Contributor Author

> How much of a difference does this make to asyncio_tcp_ssl?

It is ~5% faster on macOS with this change.

@kumaraditya303 kumaraditya303 merged commit 49ff8b6 into python:main Nov 21, 2025
50 checks passed
@kumaraditya303 kumaraditya303 deleted the critical-section branch November 21, 2025 14:19
StanFromIreland pushed a commit to StanFromIreland/cpython that referenced this pull request Dec 6, 2025
ashm-dev pushed a commit to ashm-dev/cpython that referenced this pull request Dec 8, 2025
encukou added a commit to encukou/cpython that referenced this pull request Mar 17, 2026
pythonGH-141406 improved performance by only fetching thread state once
and storing it in a variable on the stack.

This instead puts the thread state in the PyCriticalState struct
(also a temp variable on the stack), bringing the public and private
implementations closer together.
@encukou
Member

encukou commented Mar 17, 2026

I don't understand why this is only done for Py_BUILD_CORE code. Is there a reason to exclude non-stdlib users of critical sections?

Putting the thread state in a temp stack variable for all users seems to work well: https://github.com/python/cpython/pull/146066/files
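
For readers who want a picture of the struct-based variant mentioned in the commit message above, the following is a rough sketch under stated assumptions. It is not the code from the linked PR, and all names (`demo_cs`, `demo_tstate`, `demo_cs_begin`, `demo_cs_end`) are hypothetical. The critical-section struct, itself a stack variable, remembers the thread state that begin fetched, so end can reuse it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; not CPython's real types or functions. */
struct demo_cs;

typedef struct demo_tstate {
    struct demo_cs *active;   /* innermost open critical section */
} demo_tstate;

typedef struct demo_cs {
    demo_tstate *tstate;      /* cached by begin, reused by end */
    struct demo_cs *prev;     /* enclosing critical section, if any */
} demo_cs;

static _Thread_local demo_tstate demo_current;
static int demo_lookups;      /* counts thread-state lookups */

/* Models the thread-state lookup whose cost we want to pay only once. */
static demo_tstate *demo_get_tstate(void) {
    demo_lookups++;
    return &demo_current;
}

static void demo_cs_begin(demo_cs *cs) {
    demo_tstate *ts = demo_get_tstate();   /* the only lookup */
    cs->tstate = ts;
    cs->prev = ts->active;
    ts->active = cs;
}

static void demo_cs_end(demo_cs *cs) {
    /* Reuse the cached pointer instead of looking it up again. */
    cs->tstate->active = cs->prev;
}

/* Opens and closes two nested sections; returns the number of lookups. */
static int demo_nested_lookups(void) {
    demo_cs outer, inner;
    demo_lookups = 0;
    demo_cs_begin(&outer);
    demo_cs_begin(&inner);
    demo_cs_end(&inner);
    demo_cs_end(&outer);
    return demo_lookups;
}
```

With this layout each begin/end pair costs one lookup, so the two nested sections in `demo_nested_lookups()` cost two lookups in total rather than four.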

@colesbury
Contributor

I wouldn't necessarily expect it to help non-stdlib users. There are basically three cases:

  1. Python interpreter core (main executable)
  2. Python extensions that don't build with Py_BUILD_CORE
  3. Python stdlib extensions that use Py_BUILD_CORE (e.g., _ssl)

In the main executable, everything gets inlined and the access to _Py_tss_tstate is typically very fast. Additionally, in some cases, the compiler can remove redundant loads of _Py_tss_tstate.

In non-stdlib extensions, the calls are not inlined. The access to _Py_tss_tstate is again typically in the main executable, so it's very fast (often a single memory load).

In extensions like _ssl, the critical section calls are inlined, but now the access to _Py_tss_tstate is happening in a shared library. The access to TLS from a shared library usually requires a function call, so it's slower. That's the unique case that this PR was meant to address.
